This report is a compilation of my R notebooks in which I performed several analysis. As such, it will read like a lab notebook. I hope it will explain my analyses and my thinking and provide enough information for the next person to pick up this project. Please check out the source and project files for orographic_precip on my Google Drive.

Summary of Work

Motivation

Observations of the influence of topography on precipitation patterns exist throughout the world, including in the European Alps, Andes, Sierra Nevada in California, Olympic Mountains in Washington State, and Himalaya (Alpert 1986; Reiners et al. 2003; Anders et al. 2006; Anders et al. 2007). There is an undeniable relationship between topography and the spatial pattern of precipitation (Anders et al. 2006), but the direct link between precipitation and erosion is unclear (Reiners et al. 2003; Molnar 2003; D. W. Burbank et al. 2003; Dadson et al. 2003). Landscape evolution modeling enables the simulation of long-term topographic change by balancing the competing processes of uplift and erosion. Several studies have suggested the importance of spatially-varying precipitation for predicting landscape evolution (Anders et al. 2007; Han, Gasparini, and Johnson 2015). A recent advance in modeling precipitation in complex terrain includes the Linear Theory of Orographic Precipitation (LT model) (Smith and Barstad 2004) which linearizes the complex terrain using a Fourier Transform and enables rapid simulation of the processes driving orographically-enhanced precipitation.

Physical Background

A common signature of orographically-enhanced precipitation is a wet side and dry side of a mountain range (Figure 1). As a moist airmass is pushed up the windward side of the mountain, the air mass cools and water vapor condenses to water droplets. The water droplets precipitate preferentially on the windward side. The dry air mass is pushed to the leeward side, but little to no moisture is available to precipitate.

Question

  1. Can we model the spatial heterogeneity of precipitation of a mountain range?
  2. Can we incorporate orographically-enhanced precipitation in a landscape evolution model?

Study Area

I evaluated modeled precipitation in the tropical and extratropical Andes, and South Africa (Figure 2).

Figure 2. Global Mean Precipitation as estimated from the Tropical Rainfall Measurement Mission (TRMM)

Figure 2. Global Mean Precipitation as estimated from the Tropical Rainfall Measurement Mission (TRMM)

Data Sources

I used the Tropical Rainfall Measurement Mission (TRMM) as the basis for the analysis. TRMM is a satellite that was launched in 1997 with 5 instruments to measure rainfall in tropical areas. Operation ended in 2015 see NASA TRMM webpage.

I used the 2B31 product as published by Bodo Bookhagen and availabe here (see Figure 2). The 2B31 data product is a combined Precipitation Radar (PR) / TRMM Microwave Imager (TMI) rainfall intensity product that Bodo bias-corrected using precipitation gauges. This product is available at 0.04° resolution and was reprojected to a South American Albers projection at 5km resolution. See Bookhagen and Burbank (2010) for product details.

A second TRMM product used is the 3B42 which is 3-hourly precipitation rate available at 0.25° resolution from 1998-2015. I processed this on Google Earth Engine to estimate the number of hours of rainfall per year. Each 3 hour period was classified as “raining” or “not raining” and the hours of rainfall were summed per year. The temporal median of each pixel was computed for this analysis (Figure 3).

Figure 3. Global median hours of precipitation as estimated from TRMM

Figure 3. Global median hours of precipitation as estimated from TRMM

The median hours was reprojected to the same grid as the 2B31 product using bilinear resampling.

The Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) digital elevation model was downloaded and cropped for the different domains. The median Statistic, 30 arc-seconds DEM was reprojected to the same 5km grid as the TRMM products using bilinear resampling.

Method

The Linear Theory of Orographic Precipitation (LT model) is used to simulate precipitation in mountainous topography based on a DEM (Smith and Barstad (2004)). The LT model computes precipitaton intensity by estimating the condensation and evaporation processes in a saturated air mass as it rises and falls in elevation. The atmosphere is assumed to have a constant temperature profile and windspeed is constant in space and time both horizontally and verically. Water vapor condenses as the air mass rises and precipitation falls out after some delay. Time delays are used to approximate the effect of the cloud physics. As a cloud decreases in elevation, remaining water droplets may evaporate before they can fall out as precipitation.

I computed hourly precipitation rates for different mountain ranges around the world using the R package LinearTheoryOrographicPrecip available on github. I assumed a 45° wind direction for tropical and extratropical Andes and 135° wind direction for southern Africa. The remaining parameter values for the model varied based on the analysis and can be seen below. To address the first question, I used the TRMM-derived hours of precipitation per year to scale the rainfall intensity (mm/hr) to mm/yr. To address the second question I considered how to scale the rainfall intensity without the use of remote sensing.

I used the Nash-Sutcliffe (NS) coefficient to test the skill of a simulation. The NS coefficient compares the squared errors from the model with the squared error from the mean, i.e. does the model produce a result that is better than the mean of the observations? Values between 0 and 1 indicate the model is better than the mean. A value of 0 indicates the model performs the same as the mean. Values <0 indicate the mean more accurately presents the observations.

\[ NS = 1-\frac{\sum{(y_{model}-y_{obs})^2}}{\sum{(y_{model}-mean(y_{obs}))^2}} \]

What follows are six analyses that I performed to address the two science questions. Each of the analyses run independently, but the ideas build upon the former analysis. Each analysis starts with an Introduction explaining the purpose of the analysis, a Method section detailing any methods specific to the analysis, an Outcome section for a summary of results, and finally the Details of the analysis.

Analysis 1

Question

Which parameters of the LT model are the most sensitive?

Do the best parameters produce modeled precip that matches that from TRMM?

Methods

We only evaluate a subset of parameters including:

  • background precip
    • This is a spatially uniform precip intensity that is added to the results and affects the magnitude of the precip intensity
  • windspeed
    • This parameter impacts how far precip will travel
  • time delay(s)
    • condensation delay and fallout delay are assumed to be equal
    • the shorter the delays then the more quickly precip will fall to the ground once the moisture source is lifted
  • truncation value
    • the LT model produces negative and postive results. negative results are grid cells where evaporation is greater than condensation. negative precip is not realistically feasible so usually these grid cells are truncated at zero. However, I found an over abundance of zeroes produced unrealistic yearly rainfall rates so I am truncating before adding background precip. This results in a grid cells with the same precip intensity (equal to the background precip) but greater than zero.

Outcome

In general, I found the model wasn’t that sensitive to any of these parameters when compared against TRMM precip (and scaled using TRMM precip hours), and the resulting distribution of precipitation looks quite reasonable when compared against observations. Background precip was perhaps the most important. Final parameter values for other experiments were chosen to be:

  • background precip = 2 mm/hr

  • time delay(s) = 800 s (this value is partially inspired by Anders et al. 2008, Fig 1)

  • wind speed = 5 m/s

  • truncation value = 0 mm/hr

In my opinion, the model does remarkably well at capturing the overall pattern of precipitation. There are differences between the observed and modeled maps (see below) and the distributions of precip (see below), but given that the model doesn’t use any observations I was surprised.

t_delay = 800
trunc_value = 0
#winddir defined with location
windspeed = 5
bckgrd_precip=2
# Define your location. Most locations these strings will be the same but for South America, I use the same DEM (elev_SAmerica_albers.tif) for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elevation_string='SAmerica'
location_string='tropicalAndes'
winddir = 45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))

#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment with <<-
suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
preprocess_rasters(elev_string=elevation_string,location_string=location_string)
## Warning in sqrt(-1 * m_sqr): NaNs produced

precipdF %>% 
  # gather(source,precip,trmm:LT) %>% 
  ggplot(aes(trmm,LT))+
    geom_point()+
    geom_abline(col='red')+
    geom_density2d()+
    coord_cartesian(ylim=c(0,4000),xlim=c(0,4000))

precipdF %>% 
  gather(source,precip,trmm:LT) %>% 
  ggplot(aes(precip))+
    geom_density(aes(color=source)) + 
    coord_cartesian(xlim=c(0,4000))

precipdF %>% 
  mutate(error=LT-trmm) %>% 
  ggplot()+
  geom_point(aes(x=trmm,y=error))
## Warning: Removed 82588 rows containing missing values (geom_point).

Details

Methods

This parameter grid search takes a while so I ran everything and saved the results. Nash-sutcliffe was used to compare model and observations. Here is an example of how the experiment was setup and the top results.

# Define your location. Most locations these strings will be the same but for South America, I use the same DEM (elev_SAmerica_albers.tif) for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elev_string='SAmerica'
location_string='extratropicalAndes'
winddir=45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))


#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment 
suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
preprocess_rasters(elev_string,location_string)

skill_fn <- file.path(RPROJHOME,paste0('output/LT_parameter_optimization_',location_string,'_winddir-',winddir,'.rds'))
if(file.exists(skill_fn)){
  skilldF <- readRDS(skill_fn)
} else {
  paramdf <- cross_df(.l=list(bckgrd_precip=seq(1,3,.5),t_delay=seq(700,1500,100),windspeed=seq(5,5,5),winddir=winddir,trunc_value=seq(-2,0,0.5)))
  skilldF <- 
    paramdf %>% 
    mutate(
      skill=pmap_dbl(.,LToptimize,elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask,NS,Intfn)
    )
  saveRDS(skilldF,skill_fn)
}

skilldF %>%
arrange(desc(skill))
## # A tibble: 225 x 6
##    bckgrd_precip t_delay windspeed winddir trunc_value     skill
##            <dbl>   <dbl>     <dbl>   <dbl>       <dbl>     <dbl>
##  1             2    1200         5      45        -2.0 0.4051471
##  2             2    1200         5      45        -1.5 0.4051471
##  3             2    1200         5      45        -1.0 0.4051471
##  4             2    1200         5      45        -0.5 0.4051471
##  5             2    1300         5      45        -2.0 0.4049005
##  6             2    1300         5      45        -1.5 0.4049005
##  7             2    1300         5      45        -1.0 0.4049005
##  8             2    1300         5      45        -0.5 0.4049005
##  9             2    1100         5      45        -2.0 0.4042259
## 10             2    1100         5      45        -1.5 0.4042259
## # ... with 215 more rows

Results

The results are presented as pairplots. The rows and columns are the 4 parameters identified above, such that the x and y axis correspond to the values of the parameter indicated by the grey box. The main interest will be the raster images in the bottom left. The color represenst the skill score. Positive values indicate the result was better than the mean of the observations. The diagonal plots were an experiment to plot the actual skill scores agains the parameter values- they are hard to interpret so I would ignore them.

Tropical Andes

Extratropical Andes

South Africa

Combined results

Just for fun, here are the top performing models.

region bckgrd_precip t_delay windspeed winddir trunc_value skill
tropicalAndes 1.5 1500 5 45 -2.0 0.6055060
tropicalAndes 1.5 1500 5 45 -1.5 0.6055060
tropicalAndes 1.5 1500 5 45 -1.0 0.6055060
tropicalAndes 1.5 1500 5 45 -0.5 0.6055060
tropicalAndes 2.0 1500 5 45 -2.0 0.6021763
tropicalAndes 2.0 1500 5 45 -1.5 0.6021763
tropicalAndes 2.0 1500 5 45 -1.0 0.6021763
tropicalAndes 2.0 1500 5 45 -0.5 0.6021763
tropicalAndes 1.5 1400 5 45 -2.0 0.5971641
tropicalAndes 1.5 1400 5 45 -1.5 0.5971641
tropicalAndes 1.5 1400 5 45 -1.0 0.5971641
tropicalAndes 1.5 1400 5 45 -0.5 0.5971641
tropicalAndes 2.0 1400 5 45 -2.0 0.5965794
tropicalAndes 2.0 1400 5 45 -1.5 0.5965794
tropicalAndes 2.0 1400 5 45 -1.0 0.5965794
tropicalAndes 2.0 1400 5 45 -0.5 0.5965794
tropicalAndes 2.0 1300 5 45 -2.0 0.5896795
tropicalAndes 2.0 1300 5 45 -1.5 0.5896795
tropicalAndes 2.0 1300 5 45 -1.0 0.5896795
tropicalAndes 2.0 1300 5 45 -0.5 0.5896795
southafrica 1.5 800 5 135 0.0 0.4425294
southafrica 1.5 900 5 135 0.0 0.4419318
southafrica 1.5 1000 5 135 0.0 0.4412918
southafrica 1.5 1100 5 135 0.0 0.4406361
southafrica 1.5 1200 5 135 0.0 0.4399860
southafrica 1.5 1300 5 135 0.0 0.4393507
southafrica 1.5 1400 5 135 0.0 0.4387423
southafrica 1.5 1500 5 135 0.0 0.4381648
extratropicalAndes 2.0 1200 5 45 -2.0 0.4051471
extratropicalAndes 2.0 1200 5 45 -1.5 0.4051471
extratropicalAndes 2.0 1200 5 45 -1.0 0.4051471
extratropicalAndes 2.0 1200 5 45 -0.5 0.4051471
extratropicalAndes 2.0 1300 5 45 -2.0 0.4049005
extratropicalAndes 2.0 1300 5 45 -1.5 0.4049005
extratropicalAndes 2.0 1300 5 45 -1.0 0.4049005
extratropicalAndes 2.0 1300 5 45 -0.5 0.4049005
extratropicalAndes 2.0 1100 5 45 -2.0 0.4042259
extratropicalAndes 2.0 1100 5 45 -1.5 0.4042259
extratropicalAndes 2.0 1100 5 45 -1.0 0.4042259
extratropicalAndes 2.0 1100 5 45 -0.5 0.4042259
extratropicalAndes 2.0 1400 5 45 -2.0 0.4037974
extratropicalAndes 2.0 1400 5 45 -1.5 0.4037974
extratropicalAndes 2.0 1400 5 45 -1.0 0.4037974
extratropicalAndes 2.0 1400 5 45 -0.5 0.4037974
extratropicalAndes 2.0 1500 5 45 -2.0 0.4020013
extratropicalAndes 2.0 1500 5 45 -1.5 0.4020013
extratropicalAndes 2.0 1500 5 45 -1.0 0.4020013
extratropicalAndes 2.0 1500 5 45 -0.5 0.4020013
southafrica 1.5 1500 5 135 -3.0 0.3945932
southafrica 1.5 1500 5 135 -2.5 0.3945932
southafrica 1.5 1500 5 135 -2.0 0.3945932
southafrica 1.5 1500 5 135 -1.5 0.3945932
southafrica 1.5 1500 5 135 -1.0 0.3945932
southafrica 1.5 1500 5 135 -0.5 0.3945932
southafrica 1.5 1400 5 135 -3.0 0.3936591
southafrica 1.5 1400 5 135 -2.5 0.3936591
southafrica 1.5 1400 5 135 -2.0 0.3936591
southafrica 1.5 1400 5 135 -1.5 0.3936591
southafrica 1.5 1400 5 135 -1.0 0.3936591
southafrica 1.5 1400 5 135 -0.5 0.3936591
southafrica 1.5 1300 5 135 -3.0 0.3925315
southafrica 1.5 1300 5 135 -2.5 0.3925315
southafrica 1.5 1300 5 135 -2.0 0.3925315
southafrica 1.5 1300 5 135 -1.5 0.3925315
southafrica 1.5 1300 5 135 -1.0 0.3925315
southafrica 1.5 1300 5 135 -0.5 0.3925315

Analysis 2

Introduction

This analyis was motivated by the desire to include observed (or estimated based on some knowledge of climate) precipitation variability.

If we know mean and variability of precipitation, how can we parameterize background precipitation in LTmodel?

Method

We compute stochastic precipitation timeseries for the year. We assume gamma distributed precip amount around some mean and the storm events are exponentially distributed. The precipitation function is based on the stochastic precip generator in Landlab

The mean from the stochasitc precipitation timeseries is used for the background precipitation in the LT model.

Outcome

It is possible to match the mean observed yearly precipitation using a background precipitation stochastically generated. I did not look at any data to determine what reasonable even duration, interevent duration or storm depth should be, but the values shown below reproduce observed precip reasonably well.

Details

Stochastic Storms

First, define some mean attributes of our storms:

#This is a very rainy place!
avg_event_duration = 48#hours
avg_interevent_duration = 36#hours
avg_storm_depth = 50 #mm
total_t=365*24 #total duration of timeseries in hours

Estimate the timeseries and plot the result:

ggplot(precip_timeseries)+
  geom_col(aes(x=storm_start,y=intensity))+
  labs(x='storm event [hrs since Jan 1 00:00]',
       y='intensity [mm/hr]')

Here are the precipitation statistics for the year:

Linear Theory Model

We need to define some parameters for the LT model. We will use annual intensity from the stochastic storms as the background precipitation.

location_string='tropicalAndes'
elev_extratrop <- raster(file.path(RPROJHOME,paste0('data/elev/SAmerica/elev_',location_string,'_albers.tif')))
winddir = 45
bckgrd_precip = annual_precip_stats$annual_intensity
t_delay = 800
trunc_value = 0
#winddir defined above with location
windspeed = 5

Estimate Distributed Annual Precip

We’ll scale the LT model output by the number of observed hours of precip from TRMM. Then we can compare the mm/yr from TRMM to our modeled precip distribution.

Here are the hours of precipitation as observed by TRMM:

Here are the hours of precip observed by TRMM cropped to the tropical Andes:

mtn_ext=raster(file.path(RPROJHOME,paste0('gis/mtnmask_tropicalAndes.tif')))
mtn_ext <- projectRaster(mtn_ext,trmm_hours) %>% trim()
trmm_hours_mtn <- crop(trmm_hours,mtn_ext)

data(World)
plot(trmm_hours_mtn)
world_mtn <- spTransform(World[World$continent=='South America',],projection(trmm_hours_mtn))
crop(world_mtn,trmm_hours_mtn) %>% plot(add=T)

# MASS::fitdistr(raster::getValues(trmm_hours_mtn),'gamma')

To estimate the annual precip based on the LT model, I multiply the LT result by annual observed stormhours.

LTp_annual <- LTprecip * trmm_hours

Validation

TRMM comparison

Here you can see a larger extent than in the previous analysis and broadly speaking the results from the flatter areas exhibit the same trend as the observations. Higher precip is observed in the north trending to lower precip in the east.

Here we calculated the NS coefficient between the modeled and observed precip for the larger area. The skill is positive which means the distribution is better than the mean (barely). We saw previously that a background precipitation of 2 mm/hr worked well in most cases so unless you have good reason to produce precipitation stochastically it’s probably not worth it.

## [1] 0.1766396

Nonetheless, the spatial mean precip agree quite well between the observation and modeled precip.

The mean precip observed by TRMM is 1212.493716.

The mean precip from the LT model is 1120.7422351.

We can zoom into the mountains region to compare the distributions visually. We see good agreement in the location of the dark bands (indicating high precip), and these light on the windward side of the ridges.

Next we can check the errors between the modeled precip and observed precip.

A scatter plot of modeled vs observed shows there to be some deviation at low observations but generally the model and observations follow the 1:1 line. The blue lines show the density of points.

precipdF %>% 
  # gather(source,precip,trmm:LT) %>% 
  ggplot(aes(trmm,LT))+
    geom_point()+
    geom_abline(col='red')+
    geom_density2d()+
    coord_cartesian(ylim=c(0,4000),xlim=c(0,4000))

Here is the probability density function for the observed and modeled precip. The general shape is captured by the model but there is some deviation at the higher end of values.

Analysis 3

Introduction

The goal of this notebook is to determine if there is a single value of hours of precipitation that can scale the LT model hourly rainfall intensity to yealry rainfall. The idea is to replace the spatial field of precip hours from TRMM with a value that can be used if no observations are available.

Method

Multiple scalar values are tested to get a sense of how certain the value of hours can be. Nash-sutcliffe coefficient is used to check the yearly output against the mean of the observations from TRMM.

Outcome

As you can see at the bottom, there is no single value that scales hourly precip intensity to yearly precip rate better than using the mean observed precip. In fact, 0 hours provides the best estimate. Odd I think

Details

# Define your location. Most locations these strings will be the same but for South America, I use the same DEM (elev_SAmerica_albers.tif) for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elev_string='SAmerica'
location_string='extratropicalAndes'
winddir=45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))


#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment with <<-
suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
preprocess_rasters(elev_string,location_string)

The median hours of precip is 436.5 hours.

# trmm_avghrs_mtn <- setValues(trmm_hours_mtn,avg_hrs) %>% mask(trmm_hours_mtn)
trmm_medhrs_mtn <- setValues(trmm_hours_mtn,med_hrs) %>% mask(trmm_hours_mtn)
#
trmm_hrs_list=list(trmm_hours_mtn,trmm_medhrs_mtn)

# Set up an array of hours to test.
hrs_vector=c(med_hrs,seq(0,500,100))
trmm_hrs_list <- lapply(hrs_vector,function(x) setValues(trmm_hours_mtn,x) %>% mask(trmm_hours_mtn))


paramdf <- cross_df(.l=list(bckgrd_precip=2,t_delay=800,windspeed=5,winddir=winddir,trunc_value=0,trmm_hours_raster=trmm_hrs_list))
skilldF <- 
    paramdf %>% 
    mutate(
      hrsprecip=hrs_vector,
      skill=pmap_dbl(.,LToptimize,elev_raster=elev,trmm_obs_raster=trmm_precip_mtn,mtn_mask=mtn_mask,NS,Intfn)
    )
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
skilldF %>%
arrange(desc(skill))

Analysis 4

Introduction

The idea of this notebook is to test whether the hourly precip intensity can be scaled to a yearly rainfall without remote sensing.

Method

Two different scaling methods are tested:

  • scale to mean (in hindsight this is the same as the previous analysis where I tested a scalar hour value)
  • scale to min and max

The nash-sutclife coefficient is used to test the modeled distribution to the mean of the observations.

Outcome

In all cases, scaling the hourly intensity to a yearly precip distribution was worse than the mean observed precipition. In general, the 2 parameter scaling performed better than 1 parameter scaling.

Details

This test was performed for several different domains with similar results. here is an example:

# Define your location. Most locations these strings will be the same but for South America, I use the same DEM (elev_SAmerica_albers.tif) for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elev_string='SAmerica'
location_string='extratropicalAndes'
winddir=45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))


#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment with <<-
suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
preprocess_rasters(elev_string,location_string)

Min-Max scaling

We scaled with an assumed min and max. Min is always 0 so really we are scaling the maximum value and applying some uncertainty. I scaled with the following function:

\[ \frac{(max_{model}-min_{model})(x-min_{assumed})}{max_{assumed}-min_{assumed}} + min_{model} \]

min_precip_trmm <- cellStats(trmm_precip_mtn,'min')
max_precip_trmm <-  cellStats(trmm_precip_mtn,'max')
frac_vector <- seq(0.5,1.5,0.1)
assumed_max <- max_precip_trmm*frac_vector
assumed_min <- min_precip_trmm*frac_vector
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"

Mean scaling

I scaled with the mean and applied some uncertainty. This is in effect the same as the previous analysis where different scalar hours were tested. The mean scaling was performed by multiplying by the ratio of assumed mean and the modeled mean:

\[ x*\frac{mean_{assumed}}{mean_{model}} \]

# suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
# preprocess_rasters(elev_string,location_string)

avg_precip_trmm <- cellStats(trmm_precip_mtn,'mean')
frac_vector <- seq(0.5,1.5,0.1)
assumed_avg <- avg_precip_trmm*frac_vector
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"

Analysis 5

Introduction

This notebook examines the relationshp between the hours of precip from TRMM to some terrain variables including relief, slope, and aspect. The hours of precip is so far critical to converting hourly rainfall intensity to yearly rainfall with any success. This notebook trieds to find an alternative parameterization for the number of precipitation hours in each grid cell using some basic terrain variables. Since precipitation is orographically-enhanced, it might be that these orographically-driven processes also affect the duration of precip.

Method

Relief is defined as the maximum elevation difference within a 3x3 pixel window. This is implemented in GDAL as ‘roughness’ (and calculated below using the raster::terrain(). With a 5km resolution DEM, the relief is at the 15km scale.

Slope and Aspect are computed in radians.

The machine learning algorithm randomForest is known for estimating non-linear relationships with minimal overfitting. This was used to see if there is a quantifable relationship between hours of precip and some basic terrain variables (relief, slope, aspect). Such a relationship would facilitate a parameterization of hours of precip to get over the hurdle of scaling hourly LT intensity to yearly rainfall rate.

Outcome

The results of this analysis suggest that the relationship between hours of precip and these terrain variables is weak. I do not try to scale LT intensity to yearly rainfall with this parameterization because the explained variance is only 11%.

Details

I chose to look at tropical Andes first:

We compute the rainfall intensity using a set of parameters from previous analyses.

LTintensity <- LTintensity(bckgrd_precip=2,t_delay=800,windspeed=5,winddir=winddir,trunc_value=0, Intfn=Intfn, elev_raster = elev, mtn_mask=mtn_mask)

Now we can look at the LT intensity, roughness, and hours of precip with pairplots.

Here are some pairplots of the variables. Only LT and relief show any potential for a linear relationship which is not unexpected and not particularly helpful.

In an effort to model hours from TRMM, I looked to see if relief, slope, and aspect could be combined nonlinearly in a randomForest model. I’m not using northness because that is just a linear combination of slope and aspect.

rowsample <- sample(seq_len(nrow(dF)), .3*nrow(dF))

dF_rf <- 
  dF %>% 
  mutate(trmmhours=scale(trmmhours),
         relief=scale(relief),
         slope=scale(slope),
         aspect=scale(aspect))

hours_rf <- 
  randomForest::randomForest(
    x=as.matrix(dF_rf[rowsample,c('relief','slope','aspect')]),
    y=as.numeric(dF_rf[rowsample,]$trmmhours),
    mtry=2,
    na.action=na.rm
  )
                            
hours_rf
## 
## Call:
##  randomForest(x = as.matrix(dF_rf[rowsample, c("relief", "slope",      "aspect")]), y = as.numeric(dF_rf[rowsample, ]$trmmhours),      mtry = 2, na.action = na.rm) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##           Mean of squared residuals: 0.8947818
##                     % Var explained: 10.49

The results are not very promsing with a pseudo-r**2 of only .11. A quick look at the observed vs predicted values below suggests mostly a scatter shot of predictive ability. It is difficult to generalize a model like this to any DEM so I won’t pursue this any further.

xtest=as.matrix(dF_rf[-rowsample,c('relief','slope','aspect')])
ytest=as.numeric(dF_rf[-rowsample,]$trmmhours)
cen=attr(dF_rf$trmmhours,which = 'scaled:center')
sca=attr(dF_rf$trmmhours,which='scaled:scale')
plot(ytest*sca+cen, predict(hours_rf,newdata=xtest)*sca+cen)

Analysis 6

Introduction

The goal of this notebook is to explore whether precipitation can be paramterized in swaths longitudinal to the wind. This approach is inspired by Bodo’s 2010 paper looking at the hydrologic budget in the Himalaya (see Figure 6) It makes sense because there could be substantial differences in weather patterns along a mountain range.

Method

I created 100 km swaths oriented in the direction of the wind (specific to the domain). Swaths less than 100km wide are quite computation intensive and harder to visualize. For each swath, I averaged the variables across the swath (perpendicular to the wind) to be able to plot profiles for each swath as well as fit a non-linear statistical model for precip and precip hours.

For the statistical model I used a generalized additive model (GAM) because it allows you to see how the relationship between independent and dependent variables changes. Each model is fit on a random subset of the data and then tested on the other part of the data.

Outcome

The statistical models performed better than the randomForest previously. Although these analyses are not directly comparable (different statistical models, different r2 formula), they suggest that looking at the swath profiles in the direction of wind has more potential for parameterizing precip hours or precip than broadly computing and scaling the LT model.

This outcome corresponds with my conversation with Bodo, i.e. relief and profile distance may be the best way to paramterize precip. Figure 6 in Bookhagen and Burbank, 2010 also points to this conclusion.

I ran out of time to continue this, but I think it would be worthwhile to compare models for different domains to see how the parameter coefficients are similar (or not).

Details

I’ve primarily done this in the tropical Andes.

# Define your location. Most locations these strings will be the same but for South America, I use the same DEM for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elev_string='SAmerica'
location_string='tropicalAndes'
winddir=45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))

#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment with <<-
suppressWarnings(rm(elev,trmm_hours,trmm_precip))
preprocess_rasters(elev_string,location_string)

roughness=terrain(elev,opt='roughness')

Choose a swath width. Bodo uses 50km (I think) but the smaller they are then the longer the analysis takes.

TODO:

  1. The mean should really be taken perpendicular to the swath instead of vertically along the y axis.

  2. These profiles should be shifted so that x=0 on the graph corresponds to the highest point in the swath. It would be easier to view but my analytica abilities are failing me at the moment.

swath_width=300000

Let’s view the distribution of precip and the hours of precip across all swaths and at all distances along the profile. Both are reasonable normally distributed except 0 truncation of course.

hist(swath_spread$trmm_precip)

hist(swath_spread$trmm_hours)

Generalized Additive Models (GAM) to predict yearly precip amount and precip hours

We can also fit a generalized additive model to predict different variables. By fitting these models to swath averages you’ve hopefully reduced some of the small scale spatial variability. If you use the by= argument inside s() you can view the coefficients as they change for different values of the variables. Dividing by the mean let’s you compare coefficients for variables of different magnitudes.

Precip

Both relief and elevation increase in impact as the values get higher but relief influences the result an order of 2 greater than elevation. Where you are in the profile also has an important influence but it’s relatively consistent. I think the oscillations are because the profiles start at the raster edge and the mountain range curves. As mentioned earlier, you should shift the profiles so that x=0 is at the highest elevation. I think x may represent the depletion of moisture from the air mass as it is pushed over the mountain range.

gam_precip <- gam(trmm_precip~s(elev,by=elev)+s(relief,by=relief)+s(x,by=x),data=swath_spread[rowsample,], link=gaussian())

extract_splines=function(mdl){
  sterms=predict(mdl,type='terms')
  datplot=cbind(sterms,mdl$model) %>% tbl_df
  datplot$intercept=attr(sterms,'constant')
  
  datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
  return(datplot)
} 

pseudoR2 <- function(yobs,yhat){
  mse <- mean((yhat-yobs)^2, na.rm=T)
  1-mse/var(yobs)
}

yhat=predict(gam_precip, swath_spread[-rowsample,], type='response')
r2=pseudoR2(swath_spread[-rowsample,]$trmm_precip,yhat)

The pseudo r2 for this model is 0.3637268.

termdF=extract_splines(gam_precip)
ggplot(termdF)+
    geom_line(aes(x=relief,y=`s(relief):relief`/mean(relief)))

ggplot(termdF)+
    geom_line(aes(x=elev,y=`s(elev):elev`/mean(elev)))

ggplot(termdF)+
  geom_line(aes(x=x,y=`s(x):x`/mean(x)))

Some diagnostic plots. Overall the model is fairly well behaved with gaussian residuals and a linear response-fitted relationship.

gam.check(gam_precip)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 21 iterations.
## The RMS GCV score gradient at convergence was 0.1226032 .
## The Hessian was positive definite.
## Model rank =  31 / 31 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                     k'   edf k-index p-value    
## s(elev):elev     10.00  9.86    0.99    0.28    
## s(relief):relief 10.00  9.91    0.90  <2e-16 ***
## s(x):x           10.00  9.99    0.97    0.09 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Precip Hours

We again see relief to have the biggest influence by an order of 2-4 compared to elevation, but the standardized coefficients are relatively consistent across all ranges of relief. Elevation coefficients have large oscillations that I’m not sure how to interpret. Possibly also related to the profile shifting?

gam_hours <- gam(trmm_hours~s(elev,by=elev)+s(relief, by=relief)+s(x, by=x),data=swath_spread, link=gaussian())


yhat=predict(gam_hours, swath_spread[-rowsample,], type='response')
r2=pseudoR2(swath_spread[-rowsample,]$trmm_hours,yhat)

The pseudo r2 for this model is 0.3011913.

termdF=extract_splines(gam_hours)
ggplot(termdF)+
    geom_line(aes(x=relief,y=`s(relief):relief`/mean(relief)))

ggplot(termdF)+
    geom_line(aes(x=elev,y=`s(elev):elev`/mean(elev)))

ggplot(termdF)+
  geom_line(aes(x=x,y=`s(x):x`/mean(x)))

More diagnostics. residuals look good, qq plot looks really good.

gam.check(gam_hours)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 18 iterations.
## The RMS GCV score gradient at convergence was 0.02331009 .
## The Hessian was positive definite.
## Model rank =  31 / 31 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                     k'   edf k-index p-value    
## s(elev):elev     10.00  9.98    1.00    0.47    
## s(relief):relief 10.00  9.92    0.86  <2e-16 ***
## s(x):x           10.00 10.00    0.76  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusions and Next Steps

These analyses suggest that scaling the LTmodel (mm/hr) to real world precipitation (mm/yr) works quite well if you have a spatial map of the hours of precipitation per year. The model is not very sensitive to its parameters, presumably because the spatial heterogeneity of the precip hours has a larger impact. It is not trivial to scale the hourly intensity to a yearly rainfall rate without remote sensing because each grid cell evidently scales differently. Using the observed hours of precipitation, the modeled yearly precip can very closely approximate the observed yearly mean precip and distribution of precip (Analyses 1, 2). Importantly, you can not just scale the modeled distribution to have the same mean or min/max as the observed distribution and expect a realistic distribution (Analyses 3, 4). It is currently not feasible to use the LT model for simulating precip in a landscape evolution model because the spatially variable scaling is required to convert the precip intensity (mm/hr) to mm/yr.

One solution might be to parameterize the hours of precip using topographic features, but my initial analyses for this did not yield great results for an entire domain (Analysis 5). Evaluating the precipitation in 100km-500km swaths parallel to the wind direction provides the most promise for parameterizing precipitation (Analysis 6). This analysis should be extended to other regions to test parameter stability of a statistical model.

Additional Considerations

  • Another precip dataset should be used to confirm the results presented here. The only other precip product I can think of with high enough resolution to possibly capture orographically-enhanced precip globally is WorldClim. In the U.S. you could look at PRISM too.

References

Alpert, P. 1986. “Mesoscale Indexing of the Distribution of Orographic Precipitation over High Mountains.” Journal of Climate and Applied Meteorology 25 (4). American Meteorological Society: 532–45. doi:10.1175/1520-0450(1986)025<0532:MIOTDO>2.0.CO;2.

Anders, Alison M, Gerard H Roe, Dale R Durran, and Justin R Minder. 2007. “Small-Scale Spatial Gradients in Climatological Precipitation on the Olympic Peninsula.” Journal of Hydrometeorology 8 (5). American Meteorological Society: 1068–81. doi:10.1175/JHM610.1.

Anders, Alison M, Gerard H Roe, Bernard Hallet, David R Montgomery, Noah J Finnegan, and Jaakko Putkonen. 2006. “Spatial patterns of precipitation and topography in the Himalaya.” In Special Paper 398: Tectonics, Climate, and Landscape Evolution, 398:39–53. Geological Society of America. doi:10.1130/2006.2398(03).

Bookhagen, B., and Douglas W Burbank. 2010. “Toward a complete Himalayan hydrological budget: Spatiotemporal distribution of snowmelt and rainfall and their impact on river discharge.” Journal of Geophysical Research 115 (F3): F03019. doi:10.1029/2009JF001426.

Burbank, D. W., A. E. Blythe, J. Putkonen, B. Pratt-Sitaula, E. Gabet, M. Oskin, A. Barros, and T. P. Ojha. 2003. “Decoupling of erosion and precipitation in the Himalayas.” Nature 426 (6967): 652–55. doi:10.1038/nature02187.

Dadson, Simon J., Niels Hovius, Hongey Chen, W. Brian Dade, Meng-Long Hsieh, Sean D. Willett, Jyr-Ching Hu, et al. 2003. “Links between erosion, runoff variability and seismicity in the Taiwan orogen.” Nature 426 (6967): 648–51. doi:10.1038/nature02150.

Han, Jianwei, Nicole M Gasparini, and Joel P L Johnson. 2015. “Measuring the imprint of orographic rainfall gradients on the morphology of steady-state numerical fluvial landscapes.” Earth Surface Processes and Landforms 40 (10): 1334–50. doi:10.1002/esp.3723.

Molnar, Peter. 2003. “Nature, nurture and landscape.” Nature 426 (December): 11–13. doi:10.1038/426612a.

Reiners, Peter W, Todd A Ehlers, Sara G Mitchell, and David R Montgomery. 2003. “Coupled spatial variations in precipitation and long-term erosion rates across the Washington Cascades.” Nature 426 (6967): 645–47. doi:10.1038/nature02111.

Smith, Ronald B., and Idar Barstad. 2004. “A Linear Theory of Orographic Precipitation.” Journal of the Atmospheric Sciences 61 (12): 1377–91. doi:10.1175/1520-0469(2004)061<1377:ALTOOP>2.0.CO;2.